Accuracy prompts are a replicable and generalizable approach for reducing the spread of misinformation

Interventions that shift users' attention toward the concept of accuracy represent a promising approach for reducing misinformation sharing online. We assess the replicability and generalizability of this accuracy prompt effect by meta-analyzing 20 experiments (with a total N = 26,863) completed by our group between 2017 and 2020. This internal meta-analysis includes all relevant studies regardless of outcome and uses identical analyses across all studies. Overall, accuracy prompts increased the quality of news that people share (sharing discernment) relative to control, primarily by reducing sharing intentions for false headlines by 10% relative to control in these studies. The magnitude of the effect did not significantly differ by content of headlines (politics compared with COVID-19-related news) and did not significantly decay over successive trials. The effect was not robustly moderated by gender, race, political ideology, education, or value explicitly placed on accuracy, but was significantly larger for older, more reflective, and more attentive participants. This internal meta-analysis demonstrates the replicability and generalizability of the accuracy prompt effect on sharing discernment.


Section 3. Re-analysis of ideology in the Twitter field experiment
Here we present an analysis of the role of ideology in the accuracy prompt Twitter field experiment of ref. 11. There were a total of N = 5,379 users in the experiment, and their ideology was estimated based on the accounts they followed using the algorithm of Barbera et al. (ref. 25). As shown in Figure S4, the users were overwhelmingly conservative.

Supplementary Figure 4. Distribution of ideology scores for Twitter users in the field experiment.
Ideology was estimated based on the accounts they followed using the algorithm of Barbera et al.
Given the extreme left skew of the distribution of ideology scores, we follow the same approach used in ref. 11 for handling extreme values and winsorize ideology at the lower 95th percentile. We then look at the interaction between a "post-treatment" dummy and ideology when predicting the quality of news links shared by the users. We follow the main text models of ref. 11: we analyze retweets without comment, include links to both opinion and non-opinion articles, and exclude data from the day on which a technical issue led to a randomization failure. We focus on the most straightforward outcome measure, the average relative quality of links retweeted in a given user-day, and on the model specification reported in the main text, which includes wave fixed effects and calculates P values using Fisherian Randomization Inference (comparing t-statistics across 500 permutations). This model finds no significant interaction between ideology and the post-treatment dummy (pFRI = 0.97); the model is visualized in Figure S5. As shown in Table S2, we also find no significant interaction when using other model specifications, or when using the outcome measure of summed average quality of links retweeted.
Supplementary Figure 5. Average relative quality as a function of experimental condition across the spectrum of political ideology. Predicted average relative quality score based on ideology and pre-versus post-treatment in the Twitter field experiment.
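
The winsorizing and randomization-inference steps described in this section can be illustrated with a minimal Python sketch. It is not the analysis code of ref. 11: the function names are hypothetical, the cutoff percentile is left as a parameter, wave fixed effects are omitted, and the permutation simply shuffles the post-treatment dummy rather than re-drawing treatment dates under the actual experimental design. It only shows the general logic of comparing an observed interaction t-statistic against t-statistics from permuted assignments.

```python
import numpy as np

def winsorize_lower(x, lower_pct):
    """Raise every value below the lower_pct-th percentile up to that percentile."""
    x = np.asarray(x, dtype=float)
    cutoff = np.percentile(x, lower_pct)
    return np.maximum(x, cutoff)

def interaction_t(y, post, ideology):
    """t-statistic on the post x ideology term of a plain OLS fit.

    Assumption: no wave fixed effects or clustering, unlike the real model.
    """
    y = np.asarray(y, dtype=float)
    post = np.asarray(post, dtype=float)
    ideology = np.asarray(ideology, dtype=float)
    X = np.column_stack([np.ones_like(y), post, ideology, post * ideology])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3] / np.sqrt(cov[3, 3])

def fri_p_value(y, post, ideology, n_perm=500, seed=0):
    """Two-sided randomization-inference p value for the interaction term.

    Assumption: treatment assignment is re-drawn by shuffling `post`.
    """
    rng = np.random.default_rng(seed)
    post = np.asarray(post, dtype=float)
    observed = interaction_t(y, post, ideology)
    perm_stats = np.array([
        interaction_t(y, rng.permutation(post), ideology)
        for _ in range(n_perm)
    ])
    # Share of permuted t-statistics at least as extreme as the observed one.
    return float(np.mean(np.abs(perm_stats) >= abs(observed)))
```

A p value close to 1, such as the pFRI = 0.97 reported above, means that nearly all permuted assignments produce an interaction t-statistic at least as large in magnitude as the observed one.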

Section 4. Moderation of accuracy prompt effect on discernment by ideology and partisanship considering only the Evaluation treatment.

Section 5. Meta-regression for individual differences
Supplementary Table 5. Coefficients and p values from meta-regressions predicting individual-level difference moderation using platform, news type, and baseline discernment. The left half of the table shows the coefficients when predicting the coefficient for the 3-way interaction between headline veracity, condition, and the individual difference, which captures the extent to which the individual difference moderates the treatment effect on sharing discernment. The right half of the table shows the coefficients when predicting the coefficient on the 2-way interaction between headline veracity and the individual difference, which captures how the individual difference relates to baseline sharing discernment in the control condition.
Column groups: Moderation of accuracy prompt effect on discernment (left); Relationship with baseline discernment in control (right). First predictor row: Platform (1 = MTurk).

Section 6. Including research from other groups
Here we consider the robustness of our main meta-analytic results to including accuracy prompt studies conducted by other groups that meet our inclusion criteria. Doing so yields only one additional eligible study, Roozenbeek et al. The coefficient on the interaction between condition and headline veracity and its 95% confidence interval are shown for each study, and the meta-analytic estimate is shown with the red dotted line and blue diamond (positive values indicate that the treatment increased sharing discernment).
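
To illustrate how per-study interaction coefficients can be combined into a single meta-analytic estimate, the sketch below pools estimates and standard errors with a DerSimonian-Laird random-effects model. The choice of estimator is an assumption made for illustration, since this section does not state which pooling method was used, and the function name and inputs are hypothetical.

```python
import numpy as np

def random_effects_pool(estimates, std_errors):
    """Pool study coefficients with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is illustrative, not necessarily the one
    used for the meta-analysis described in the text.
    Returns (pooled estimate, pooled standard error, tau^2).
    """
    b = np.asarray(estimates, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2
    w = 1.0 / v                               # inverse-variance weights
    fixed = np.sum(w * b) / np.sum(w)         # fixed-effect estimate
    q = np.sum(w * (b - fixed) ** 2)          # Cochran's Q
    k = len(b)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    # Between-study variance, truncated at zero; zero when undefined.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = 1.0 / (v + tau2)                   # random-effects weights
    pooled = np.sum(w_re * b) / np.sum(w_re)
    se_pooled = np.sqrt(1.0 / np.sum(w_re))
    return float(pooled), float(se_pooled), float(tau2)

def confidence_interval(pooled, se_pooled, z=1.96):
    """Normal-approximation 95% confidence interval for the pooled estimate."""
    return pooled - z * se_pooled, pooled + z * se_pooled
```

Under this sketch, adding one more study amounts to appending its coefficient and standard error to the input lists and re-running the pooling.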